Optimal Index Policies for MDPs with a Constraint
Author
Abstract
Many controlled queueing systems possess simple index-type optimal policies when discounted, average, or finite-time cost criteria are considered. This structural result makes the computation of optimal policies relatively simple. Unfortunately, for constrained optimization problems, the index structure of the optimal policies is in general not preserved. As a result, computing optimal policies for the constrained problem appears to be a much more difficult task. We provide a framework under which the solution of the constrained optimization problem uses the same index policies as the unconstrained problem. The method is applicable to the discrete-time Klimov system, which is shown to be equivalent to the open bandit problem.
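The abstract does not spell out the construction, but a standard baseline for constrained MDPs, and one way to reuse unconstrained solutions, is Lagrangian relaxation: fold the constraint cost into the objective with a multiplier, solve the resulting unconstrained MDP (which keeps its simple structure), and search over the multiplier until the constraint is met. The sketch below illustrates this on a hypothetical two-state, two-action discounted MDP; the data, function names, and the bisection bounds are assumptions, not the paper's method, and the true constrained optimum may in general require randomizing between two policies found near the critical multiplier.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP.
# P[a, s, s'] are transition probabilities, c[a, s] is the cost to minimise,
# d[a, s] is the constrained cost; we require its discounted value <= budget.
gamma = 0.9
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
c = np.array([[1.0, 0.2],
              [2.0, 0.5]])
d = np.array([[0.0, 1.0],
              [0.5, 0.0]])
budget = 3.0

def solve_unconstrained(cost, iters=500):
    """Value iteration for the unconstrained MDP with the given one-step cost."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = cost + gamma * P @ V          # Q[a, s]
        V = Q.min(axis=0)
    return Q.argmin(axis=0)               # a deterministic policy

def discounted_cost(policy, cost, start=0, iters=500):
    """Discounted cost of a deterministic policy, evaluated from a start state."""
    V = np.zeros(2)
    for _ in range(iters):
        V = np.array([cost[policy[s], s] + gamma * P[policy[s], s] @ V
                      for s in range(2)])
    return V[start]

# Bisection on the Lagrange multiplier: a larger multiplier penalises the
# constrained cost more heavily and pushes the unconstrained solution
# toward feasibility.
lo, hi = 0.0, 50.0
for _ in range(40):
    lam = 0.5 * (lo + hi)
    policy = solve_unconstrained(c + lam * d)
    if discounted_cost(policy, d) > budget:
        lo = lam
    else:
        hi = lam

policy = solve_unconstrained(c + hi * d)   # feasible side of the search
print("multiplier ~", hi, "policy:", policy, "d-cost:", discounted_cost(policy, d))
```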
Similar resources
Safe Stochastic Planning: Planning to Avoid Fatal States
Markov decision processes (MDPs) are applied as a standard model in Artificial Intelligence planning. MDPs are used to construct optimal or near-optimal policies or plans. One area that is often missing from discussions of planning in stochastic environments is how MDPs handle safety constraints expressed as the probability of reaching threat states. We introduce a method for finding a value opti...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented v...
Total Expected Discounted Reward MDPs: Existence of Optimal Policies
This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.
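As a compact illustration of the discounted dynamic programming formulation referred to above, the sketch below runs value iteration on a hypothetical finite MDP; the transition data, rewards, and discount factor are made up for the example and are not taken from the article.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP; P[a, s, s'] are transition probabilities,
# r[a, s] are one-step rewards, beta < 1 is the discount factor.
rng = np.random.default_rng(0)
beta = 0.95
P = rng.dirichlet(np.ones(3), size=(2, 3))
r = np.array([[1.0, 0.0, 0.5],
              [0.2, 1.5, 0.0]])

V = np.zeros(3)
for _ in range(10_000):
    Q = r + beta * P @ V          # Q[a, s] = r(a, s) + beta * E[V(next state)]
    V_new = Q.max(axis=0)         # Bellman optimality operator (a beta-contraction)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

policy = Q.argmax(axis=0)         # a stationary deterministic policy attaining V
print("optimal values:", V, "policy:", policy)
```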
Asymptotic properties of constrained Markov Decision Processes
We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite-horizon MDPs to the infinite-horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space,...
Multiple-Goal Reinforcement Learning with Modular Sarsa(0)
We present a new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes. According to our formulation, different sub-goals are modeled as MDPs that are coupled by the requirement that they share actions. Existing reinforcement learning algorithms address similar problem formulations by fir...
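The abstract above is truncated, but the coupling it describes (sub-goal MDPs that must share a single executed action) admits a simple reading: each module keeps its own Q-values and reward signal, the composite agent chooses one action (for instance, greedily with respect to the summed module Q-values), and every module performs an on-policy Sarsa(0) update with that shared action. The sketch below illustrates this reading only; the class names, action set, states, and learning constants are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right", "stay"]
alpha, gamma, eps = 0.1, 0.95, 0.1

class Module:
    """One sub-goal MDP with its own Q-table and reward signal."""
    def __init__(self):
        self.Q = defaultdict(float)       # Q[(state, action)]

    def update(self, s, a, r, s_next, a_next):
        # Sarsa(0): on-policy TD update using the action actually taken next.
        target = r + gamma * self.Q[(s_next, a_next)]
        self.Q[(s, a)] += alpha * (target - self.Q[(s, a)])

def choose_action(modules, s):
    # The modules are coupled through this single choice: with probability eps
    # explore, otherwise act greedily with respect to the summed module Q-values.
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: sum(m.Q[(s, a)] for m in modules))

# One illustrative transition "s0" -> "s1" with per-module rewards.
modules = [Module(), Module()]
a = choose_action(modules, "s0")
a_next = choose_action(modules, "s1")
for m, reward in zip(modules, [1.0, -0.5]):
    m.update("s0", a, reward, "s1", a_next)
```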
Publication date: 2014